Learning to Optimize via Information-Directed Sampling


Similar resources

Learning to Optimize via Information-Directed Sampling

This paper proposes information-directed sampling, a new algorithm for balancing exploration and exploitation in online optimization problems in which a decision-maker must learn from partial feedback. The algorithm quantifies the amount learned by selecting an action through an information-theoretic measure: the mutual information between the true optimal action and the algorithm’s next...


Learning to Optimize via Posterior Sampling



Information Directed reinforcement learning

Efficient exploration is recognized as a key difficulty in reinforcement learning. We consider an episodic undiscounted MDP where the goal is to minimize the sum of regrets over different episodes. Classical methods are either based on optimism in the face of uncertainty or on probability matching. In this project we explore an approach that aims at quantifying the cost of exploration while rem...


A Note on Information-Directed Sampling and Thompson Sampling

This note introduces three Bayesian-style multi-armed bandit algorithms: information-directed sampling, Thompson sampling, and generalized Thompson sampling. The goal is to give an intuitive explanation of these three algorithms and their regret bounds, and to provide some derivations that are omitted in the original papers.
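As a rough illustration of the simplest of the three algorithms named above, the following is a minimal sketch of Thompson sampling on a simulated Bernoulli bandit. It is not taken from the note: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the simulated arm means are all assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative only).

    true_means: hypothetical success probability of each arm, used only to
                simulate rewards; the learner never reads it directly.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1] * n_arms  # Beta posterior parameter: 1 + observed successes
    beta = [1] * n_arms   # Beta posterior parameter: 1 + observed failures
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one plausible mean per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Play the arm whose sampled mean is largest (probability matching).
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate update of the Beta posterior for the chosen arm.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.9], horizon=2000, seed=1))
```

Under these assumptions, most pulls should concentrate on the arm with the higher simulated mean as the posterior sharpens. Information-directed sampling differs in how it selects the action and is not shown here.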


Learning to Optimize Plan Execution in Information Agents

We can build software agents to perform a wide variety of useful information gathering and monitoring tasks on the Web [1]. For example, in the travel domain, we can construct agents to notify you of flight delays in real time, monitor for schedule and price changes, and even send a fax to a hotel if your flight is delayed to ensure that your hotel room will not be given away [2,3]. To perform ...



Journal

Journal title: Operations Research

Year: 2018

ISSN: 0030-364X,1526-5463

DOI: 10.1287/opre.2017.1663